ANALYSIS OF PROSPER LOAN DATA by CHUNDI HIMAKIRAN KUMAR

## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...

Prosper Loan Dataset: The original Prosper Loan Dataset contains information of loans cetgorized across 81 variables. For the purpose of this analysis we are subsetting this dataset to analyse the data pertaining to 16 variables namely ‘CreditGrade’,‘Term’,‘LoanStatus’,‘BorrowerAPR’, ‘BorrowerRate’,‘ListingCategory..numeric.’,‘BorrowerState’,‘Occupation’, ‘EmploymentStatus’,‘EmploymentStatusDuration’,‘IsBorrowerHomeowner’, ‘IncomeRange’,‘LoanNumber’,‘LoanOriginalAmount’,‘LoanOriginationDate’ and ‘LoanOriginationQuarter’.

Understanding the structure of the dataset

## 'data.frame':    113937 obs. of  16 variables:
##  $ CreditGrade              : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                     : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus               : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ BorrowerAPR              : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ ProsperScore             : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.: int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState            : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation               : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus         : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner      : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ IncomeRange              : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ LoanNumber               : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount       : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate      : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter   : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...

Understanding the range of values for each column:

CreditGrade:

levels(ld$CreditGrade)
## [1] ""   "A"  "AA" "B"  "C"  "D"  "E"  "HR" "NC"

Term in Months:

range(ld$Term)
## [1] 12 60

Term in Months Distribution:

table(ld$Term)
## 
##    12    36    60 
##  1614 87778 24545

Loan Status:

levels(ld$LoanStatus)
##  [1] "Cancelled"              "Chargedoff"            
##  [3] "Completed"              "Current"               
##  [5] "Defaulted"              "FinalPaymentInProgress"
##  [7] "Past Due (1-15 days)"   "Past Due (16-30 days)" 
##  [9] "Past Due (31-60 days)"  "Past Due (61-90 days)" 
## [11] "Past Due (91-120 days)" "Past Due (>120 days)"

Loan Status Distribution:

table(ld$LoanStatus)
## 
##              Cancelled             Chargedoff              Completed 
##                      5                  11992                  38074 
##                Current              Defaulted FinalPaymentInProgress 
##                  56576                   5018                    205 
##   Past Due (1-15 days)  Past Due (16-30 days)  Past Due (31-60 days) 
##                    806                    265                    363 
##  Past Due (61-90 days) Past Due (91-120 days)   Past Due (>120 days) 
##                    313                    304                     16

Borrower APR:

ld_without_NA <- subset(ld, !is.na(ld$BorrowerAPR))
range(ld_without_NA$BorrowerAPR)
## [1] 0.00653 0.51229

Loan Original Amount:

range(ld$LoanOriginalAmount)
## [1]  1000 35000

Employment status duration in months:

ld_without_NA <- subset(ld, !is.na(ld$EmploymentStatusDuration))
range(ld_without_NA$EmploymentStatusDuration)
## [1]   0 755

Prosper score for Loans:

ld_without_NA <- subset(ld, !is.na(ld$ProsperScore))
range(ld_without_NA$ProsperScore)
## [1]  1 11
table(ld$ProsperScore)
## 
##     1     2     3     4     5     6     7     8     9    10    11 
##   992  5766  7642 12595  9813 12278 10597 12053  6911  4750  1456

Univariate Plots Section

ggplot(aes(x = Term), data = ld) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(aes(x = CreditGrade), data = ld) +
  geom_histogram(stat = "count")
## Warning: Ignoring unknown parameters: binwidth, bins, pad

ggplot(aes(x = LoanStatus), data = ld) +
  geom_histogram(stat = "count")
## Warning: Ignoring unknown parameters: binwidth, bins, pad

ggplot(aes(x = BorrowerAPR), data = ld) +
  geom_histogram(binwidth = 0.005) +
  scale_x_continuous(breaks = seq(0,0.5,0.05))
## Warning: Removed 25 rows containing non-finite values (stat_bin).

ggplot(aes(x = EmploymentStatus), data = ld) +
  geom_histogram(stat = "count")
## Warning: Ignoring unknown parameters: binwidth, bins, pad

ggplot(aes(x = IncomeRange), data = ld) +
  geom_histogram(stat = "count")
## Warning: Ignoring unknown parameters: binwidth, bins, pad

ggplot(aes(x = LoanOriginalAmount), data = ld) +
  geom_histogram(binwidth = 1000) +
  scale_x_continuous(breaks = seq(0,35000,2500))

ggplot(aes(x = ListingCategory..numeric.), data = ld) +
  geom_histogram(stat = "count") +
  scale_x_continuous(breaks = seq(0,20,1))
## Warning: Ignoring unknown parameters: binwidth, bins, pad

ggplot(aes(x = BorrowerState), data = ld) +
  geom_histogram(stat = "count")
## Warning: Ignoring unknown parameters: binwidth, bins, pad

ggplot(aes(x = EmploymentStatusDuration), data = ld) +
  geom_histogram(binwidth=30) +
  scale_x_continuous(breaks = seq(0,600,30))
## Warning: Removed 7625 rows containing non-finite values (stat_bin).

ggplot(aes(x = ProsperScore), data = ld) +
  geom_histogram() +
  scale_x_continuous(breaks = seq(1,11,1))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 29084 rows containing non-finite values (stat_bin).

ggplot(aes(x = ProsperScore), data = subset(ld,!is.na(IsBorrowerHomeowner))) +
  geom_histogram(aes(color = IsBorrowerHomeowner)) +
  scale_x_continuous(breaks = seq(1,11,1))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 29084 rows containing non-finite values (stat_bin).

Univariate Analysis

TERM HISTOGRAM ANALYSIS: It is observed that all the loans are either of 12,36 or 60 months duration. The maximum number of loans have a term of 36 months followed by 60 month term loans.

CREDIT GRADE HISTOGRAM ANALYSIS: It is observed that max loans have no credit grade assigned. Among the rest ‘C’ Credit Grade loans are marginally higher than the rest grades.

LOAN STATUS HISTOGRAM ANALYSIS: It is observed that approximately 5000 loans have defaulted status and maximum loans in the dataset are current. The pst due loans are comparitveley very less.

BORROWER APR HISTOGRAM ANALYSIS: It is observed that maximum loans have been issued at an APR of 36%.

EMPLOYMENT STATUS HISTOGRAM ANALYSIS: It is observed that maximum loans have been issued to Employed people and very few loans have been issued to not employed and part time people.

INCOME RANGE HISTOGRAM ANALYSIS: The histogram is a perfect bell curve and shows that max loans have been given to the middle income groups with $25K-49K and $50K-74.99K and the loans fall off on either side as the income rises and falls.

LOAN ORIGINAL AMOUNT HISTOGRAM ANALYSIS: The most frequently sanctioned Loan amounts seems to be 4K$, 10K and 15K dollars. There are also loans sanctioned for 35K dollars.

LISTING CATEGORY HISTOGRAM ANALYSIS: Maximum loans have been sanctioned listing catgeory 1(Debt Consolidation) followd by 0(Not Available) and 7(Other). Very few loans have been sanctioned to 8 - Baby&Adoption, 9 - Boat, 10 - Cosmetic Procedure, 11 - Engagement Ring, 12 - Green Loans followed by 18 - Taxes, 19 - Vacation, 20 - Wedding Loans

BORROWER STATE HISTOGRAM ANALYSIS: Max lons have been sanctioned to people from state of California followed by approximatley equal number of loans to people from Florida, Illinois, New York and Texas.

EMPLOYMENT STATUS DURATION HISTOGRAM ANALYSIS: It is observed that max loans have been given to people who have only spent a month or two in their employment status and there is a clear downward march of loans given out decreasing as the employment status duration increases.

PROSPER SCORE HISTOGRAM ANALYSIS: It is observed that max loans have a prosper score of 4,6 and 8 while there are comparitevely very few loans having the worst score of 1 and best score of 11. We also see that maximum loans have been given to non home owners as compared to home owners across all the prosper score ratings.

Main features of interest in your dataset?

Based on the above analysis the PropserScore, BorrowedAPR and the Loan Amount seem to be features of interest

Other features in the dataset which will help support investigation into

the above feature(s) of interest?

IsBorrowerHomeowner, Term and EmploymentStatusDuration are the features which could support the investigation.

Unusual Distributions..

The employment status duration histogram was unexpected as it showed that the loans granted to people decreased as their duration in the respective employment increased. It perhaps could be attributed to the fact that max loans were given out to new employees upon getting a job within the first two months.

Bivariate Plots Section

table(ld$Term)
## 
##    12    36    60 
##  1614 87778 24545
## Warning: Removed 25 rows containing missing values (geom_point).

## Warning: Removed 25 rows containing missing values (geom_point).

## Warning: Removed 7625 rows containing missing values (geom_point).

ggplot(aes(x = (EmploymentStatusDuration%%12), y = BorrowerAPR), data = ld) +
  geom_point()
## Warning: Removed 7625 rows containing missing values (geom_point).

## Warning: Removed 7625 rows containing missing values (geom_point).

## Warning: Removed 7625 rows containing missing values (geom_point).

## Warning: Removed 29084 rows containing missing values (geom_point).

## Warning: Removed 29084 rows containing missing values (geom_point).

## Warning: Removed 29103 rows containing missing values (geom_point).

## Warning: Removed 29084 rows containing missing values (geom_point).

Bivariate Analysis

** Term vs Borrower APR ANALYSIS**: There are only three values of term ie 12,36 and 60 months. However from this scatterplot we can arrive at the conclusion that there is a wider range of Borrower APR values for the 36 months term compared to the 60 and 12 month terms.

** Borrower APR vs LoanOriginalAmount ANALYSIS**: As the Loan amount increases the Borrower APR moves to a range between 0.1 to 0.2 from 0.05 to 0.4. There seems to be a negative corelation between Borrower APR and Loan Amount.

** Borrower APR vs EmploymentStatus duration**: We see that Employment status duration has no effect on the Borrower APR

** LoanOriginalAmount vs EmploymentStatus duration**: We see that Employment status duration has no effect on the Borrower APR

** Term vs LoanOriginalAmount**: We see that LoanOriginalAmount has no effect on the Term of the loan.

** ProsperScore vs LoanOriginalAmount**: We see that majority of loans having LoanOriginalAmounts in excess of 25000 have prosper score greater than 5.

** ProsperScore vs BorrowerAPR**: There does not seem to be any relation between the proper score and the borrower APR.

** ProsperScore vs EmploymentStatusDuration**: There are very few loans with a low prosper score of 1 and duartion greater than 400 days.

** ProsperScore vs Term**: There seems to be a consistent distribution of loans with the entire range of prosper scores across the three terms.

Relationships of Interest

The relationship between BorrowedAPR and the LoanOriginalAmount is interesting and will be further investiagted in the next section. Using this relationship along with the relationships of ProsperScore vs BorrowerAPR and ProsperScore vs LoanOriginalAmount can help us derive some intelligence from the data.

Multivariate Plots Section

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
ld_by_p_score <- group_by(ld, ProsperScore)
ld.BorrowerAPR_by_P_Score_mean <- summarise(ld_by_p_score,
                          BorrowerAPR_mean = mean(BorrowerAPR),
                          BorrowerAPR_median = median(BorrowerAPR),
                          LoanAmount = mean(LoanOriginalAmount),
                          n = n())
ld.BorrowerAPR_by_P_Score_mean <- arrange(ld.BorrowerAPR_by_P_Score_mean, ProsperScore)
ggplot(aes(x = ProsperScore, y = LoanAmount), data = ld.BorrowerAPR_by_P_Score_mean)+
  geom_line()
## Warning: Removed 1 rows containing missing values (geom_path).

ggplot(aes(x = ProsperScore, y = BorrowerAPR_mean), data = ld.BorrowerAPR_by_P_Score_mean)+
  geom_line()
## Warning: Removed 1 rows containing missing values (geom_path).

Multivariate Analysis

Relationship between Mean Loan Amount and the ProsperScore: We want to find a realtionship between the mean Loan amount sanctioned and the ProsperScore. We see that there is an increasing linear relationship between the mean Loan Amount and the Prosper Score.

Relationship between Mean Borrower APR and the ProsperScore: We want to find a relationship between the Mean Borrower APR and the Prosper Score. To do this we group the loans in ld dataframe by Propser Score and the find the mean Borrower APR of each group. Plotting this we find a very close inverse linear relationship between ProsperScore and the mean Borrower APR.


Final Plots and Summary

Plot One

## Warning: Removed 25 rows containing missing values (geom_point).

Description One

The Range of Borrower APR seems to clearly decrease as the loan amount increases Loans below 5K$ have an APR range from 0.05 to 0.4 whereas loans upward of 30K$ have an APR between 0.1 and 0.2

Plot Two

ggplot(aes(x = ProsperScore, y = LoanAmount), data = ld.BorrowerAPR_by_P_Score_mean)+
  geom_line()
## Warning: Removed 1 rows containing missing values (geom_path).

Description Two

The mean Loan Amount of loans grouped by ProsperScore has a clear approximate increasing linear relationship with the ProsperScore. It can be understood logically that highly rated loans have a larger Loan Amount than loans with poor rating.

Plot Three

ggplot(aes(x = ProsperScore, y = BorrowerAPR_mean), data = ld.BorrowerAPR_by_P_Score_mean)+
  geom_line()
## Warning: Removed 1 rows containing missing values (geom_path).

Description Three

As an inverse of the previous graph we can see here that the APR mean falls with increasing Prosper Score. Again this can be understood in a way that the loans with higher Prosper SCore have a mean APR lower than the loans with poorer Prosper Scores.


Reflection

It was a difficult process to analyze this dataset. The intial difficulty lay in trying to guess the interesting variables for analysis from a total of 81 variables. The distributions of the variables helped understand the nature of the data and the distribution which in turn help guess the pair of variables that could seem to have some relationship between them. After narrowing down to the interesting relationship between Borrowed APR and Loan Amount it was decided to explore this relationship further using the ProsperScore variable. Finally we succeeded in establishing that both the Loan Amount and Borrowed APR varied in a linear fashion compared to the ProsperScore variable.